

Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
    by baskerville.CS.Arizona.EDU (8.9.1a/8.9.1) with SMTP id IAA24270
    for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 11 Sep 1998 08:21:24 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
    id AA31078; Fri, 11 Sep 1998 08:20:57 -0700
To: icon-group@optima.CS.Arizona.EDU
Date: Fri, 11 Sep 1998 09:22:34 +0900
From: Eric Hildum <Eric.Hildum@japan.ncr.com>
Message-Id: <35F86D49.9BDF7813@Japan.NCR.COM>
Organization: NCR Japan
Sender: icon-group-request@optima.CS.Arizona.EDU
References: <35F723CF.76B3CC97@Japan.NCR.COM>, <6t9b4o$8rs$1@ringer.cs.utsa.edu>
Subject: Re: Unicode support or support for non-Ascii based character manipulation?
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO

Clinton Jeffery wrote:

> Eric Hildum (Eric.Hildum@Japan.NCR.COM) wrote (and I paraphrase/edited):
> : Icon ... supporting only ASCII makes it less useful for non-English language
> : With Unicode... it should be possible to begin including support for
> : non-English and non alphabetic languages.
>
> : Has anyone thought about this yet? What does string and pattern matching
> : mean in, for example, Japanese?
>
> 1. Other folks have been thinking about it, especially Icon users in Asia.
> For example, a Chinese version of Icon has been done by researchers in China.

Glad to hear it.

> 2. Going to Unicode might not be *that* difficult, but I think Unicode isn't
> really as widely adopted as you might suggest. Many people seem to be using
> mixed 8/16-bit strings.

Windows NT, Macintosh, use Unicode. Unix is still EUC.

> 3. The semantics of string and pattern matching are no different in Japanese
> than in English. There is nothing specific to language or grammar in the Icon
> string and pattern matching repertoire. Of course, when the character set
> changes the actual code needs to change...

That surprises me. Given the above comment about mixed 8/16 bit, I would
expect you already would have run into the half width/full width character
issue. How did you handle it?

> 4. Let's look at the current situation for mixed-character sets. I am not
> sure how Chinese Icon stands on these, but consider plain-old Windows Icon.
> Divide functionality as follows:
>    non-alphabetic output: Windows Icon already can do this
>    non-alphabetic input: we have known bugs in the input processing
>        of these, either in Windows Icon or the IPL "vidgets" code.
>    non-alphabetic string scanning: not supported, but could be
>        implemented as Icon Program Library procedures. Even
>        Unicode string semantics could be implemented as library
>        procedures on top of (even length!) Icon strings.

Chinese is probably the easiest double byte language to support. I don't
think you have really considered or solved all the problems until you can
support Japanese (for representation and manipulation) and Korean (for I/O).

> We don't really need much additional infrastructure. Some folks in the user
> community could coordinate the library procedures to do this as an
> interesting project. We do also need someone who can compile Icon from its
> C code and debug I/O problems on a non-alphabetic platform at this point.

"non-alphabetic platform" hmmm, you haven't got any Chinese or Japanese grad
students on the Icon project have you...

> --
> Clint Jeffery, jeffery@cs.utsa.edu
> Division of Computer Science, The University of Texas at San Antonio
> Research http://www.cs.utsa.edu/research/plss.html

--
---------------------------
Eric Hildum
Eric.Hildum@Japan.NCR.COM
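
Two points in the exchange above can be made concrete with a small sketch: the idea of layering 16-bit string semantics as library procedures on top of even-length byte strings, and the half width/full width issue. It is written in Python rather than Icon, purely as an illustration. It assumes UTF-16-BE as the two-byte encoding, and the names find16 and fold_width are hypothetical, not part of Icon or the Icon Program Library. NFKC normalization is used as a stand-in for width folding; it also applies other compatibility mappings, so it is broader than width folding alone.

```python
import unicodedata


def find16(haystack: bytes, needle: bytes, start: int = 0) -> int:
    """Search a 16-bit string stored as an even-length byte string.

    Both arguments are assumed to be UTF-16-BE. `start` and the result
    are character indices, not byte offsets; -1 means "not found".
    """
    if len(haystack) % 2 or len(needle) % 2:
        raise ValueError("16-bit strings must have even byte length")
    pos = haystack.find(needle, 2 * start)
    while pos != -1 and pos % 2:
        # A hit at an odd byte offset straddles two characters: it is
        # an artifact of byte-wise matching, so keep searching.
        pos = haystack.find(needle, pos + 1)
    return -1 if pos == -1 else pos // 2


def fold_width(text: str) -> str:
    """Fold full-width Latin and half-width katakana to their ordinary
    forms via NFKC (an assumption: broader than pure width folding)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    text = "\u65e5\u672c\u8a9e".encode("utf-16-be")      # three kanji
    print(find16(text, "\u8a9e".encode("utf-16-be")))    # third character

    # Bytes 42 30 42 30 contain 30 42 at byte offset 1, but U+3042 is
    # not actually present as a character.
    tricky = "\u4230\u4230".encode("utf-16-be")
    print(tricky.find("\u3042".encode("utf-16-be")))     # naive byte search
    print(find16(tricky, "\u3042".encode("utf-16-be")))  # alignment-aware

    print(fold_width("\uff29\uff43\uff4f\uff4e"))        # full-width Latin
```

The alignment check is the part that differs from an 8-bit search: a plain byte-wise match can report a hit that begins in the middle of a character. Width folding is a separate concern; a matcher that should treat half-width and full-width forms as equal would have to fold both the subject and the pattern before searching.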